Week 5

Homework

· Calculate means, SDs, and confidence intervals (89%, 95%, 99%) for one continuous and one binary variable. Interpret the confidence intervals.

· Do two simulations (one continuous, one binary) to show that simulation-based standard deviations of the estimate converge to the formula-based standard error of the sampling distribution. Explain the result to show you understand what you did.

Continuous Variable

The continuous variable I chose is life expectancy (lifeExp) from the Gapminder dataset.

Mean of lifeExp = 59.47

Standard Deviation of lifeExp = 12.92

Standard Error = \(\frac{SD}{\sqrt{n}}\) = 0.3129845

Confidence Interval

89% confidence interval = (58.97 , 59.97)

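If we repeatedly drew samples of this size and built an interval this way each time, about 89% of those intervals would contain the true population mean life expectancy. For this sample, that interval runs from 58.97 to 59.97 years.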

95% confidence interval = (58.86 , 60.09)

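With the same procedure at the 95% level, about 95% of such intervals would contain the true population mean life expectancy. For this sample, that interval runs from 58.86 to 60.09 years, wider than the 89% interval because more confidence requires a larger margin.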

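99% confidence interval = (58.67, 60.28)

With the same procedure at the 99% level, about 99% of such intervals would contain the true population mean life expectancy. For this sample, that interval runs from 58.67 to 60.28 years, the widest of the three.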
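The three intervals above can be reproduced from the summary values with a short Python sketch. It assumes the normal approximation (mean +/- z * SE) and n = 1704 observations, which is the value implied by the reported standard error rather than one stated in the text. The function name `normal_ci` is introduced here only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def normal_ci(mean, sd, n, level):
    """Return (lower, upper) for a normal-approximation confidence interval."""
    se = sd / sqrt(n)
    # Two-sided critical value, e.g. level=0.95 uses the 0.975 quantile
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * se, mean + z * se


# Summary values reported above; n = 1704 is an assumption (see lead-in)
mean, sd, n = 59.47, 12.92, 1704

for level in (0.89, 0.95, 0.99):
    lower, upper = normal_ci(mean, sd, n, level)
    print(f"{level:.0%} CI: ({lower:.2f}, {upper:.2f})")
```

Because the inputs here are the rounded mean and SD, the last digit of a bound can differ slightly from a calculation on the raw data.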

Continuous Variable Simulation

We randomly draw a sample of 150 observations from the full data set and compute its mean. We repeat this 80, 580, and 1080 times, which gives 80, 580, and 1080 sample means, each from a sample of 150. The distribution of those sample means approximates the sampling distribution of the mean.

80 times

The mean of sample means is 59.55

The standard deviation of the sample means is 0.96

Standard error of the mean of the 80 sample means = \(\frac{0.96}{\sqrt{80}}\) = 0.107

580 times

The mean of sample means is 59.44

The standard deviation of the sample means is 1.04

Standard error of the mean of the 580 sample means = \(\frac{1.04}{\sqrt{580}}\) = 0.043

1080 times

The mean of sample means is 59.44

The standard deviation of the sample means is 1.00

Standard error of the mean of the 1080 sample means = \(\frac{1.00}{\sqrt{1080}}\) = 0.030

Interpret

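Every sample has the same size (150). What changes is the number of repeated samples. The formula-based standard error of the mean for a sample of 150 is \(\frac{SD}{\sqrt{n}}\) = \(\frac{12.92}{\sqrt{150}}\) ≈ 1.05.

The standard deviation of the sample means (0.96, 1.04, 1.00) is the simulation-based estimate of that standard error. It is furthest from 1.05 with only 80 repetitions and settles close to it as the number of repetitions grows. The remaining gap may reflect simulation noise or, if the samples were drawn without replacement, the finite-population correction.

Dividing that standard deviation by the square root of the number of repetitions (0.107 > 0.043 > 0.030) measures something different: how precisely the simulation pins down the mean of the sample means. It shrinks as the repetitions increase, which is why the mean of the sample means (59.55, 59.44, 59.44) settles close to the full-data mean of 59.47.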
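The simulation can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the code used for the numbers above: the population is a hypothetical stand-in (1704 normal draws with mean 59.47 and SD 12.92) rather than the real lifeExp column, and samples are drawn with replacement so that the plain SD/\(\sqrt{n}\) formula applies.

```python
import random
from math import sqrt
from statistics import fmean, pstdev, stdev

random.seed(5)  # fixed seed so the run is reproducible

# Hypothetical stand-in for the lifeExp column (assumption, not the real data)
population = [random.gauss(59.47, 12.92) for _ in range(1704)]
sample_size = 150

# Formula-based standard error of the mean for samples of 150
formula_se = pstdev(population) / sqrt(sample_size)


def simulated_se(reps):
    """SD of `reps` sample means, each from a sample drawn with replacement."""
    means = [fmean(random.choices(population, k=sample_size)) for _ in range(reps)]
    return stdev(means)


results = {reps: simulated_se(reps) for reps in (80, 580, 1080)}
for reps, sim_se in results.items():
    print(f"{reps:>5} repetitions: simulated SE = {sim_se:.3f}, formula SE = {formula_se:.3f}")
```

The quantity to compare with the formula is the SD of the sample means itself. It is not divided again by the square root of the number of repetitions.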

Binary Variable

Suppose we have a partial population (a sample) of 10,000 people, of whom 8,000 are 18 or older and the remaining 2,000 are minors. We code adults as 1 and minors as 0.

We model this variable as a Bernoulli distribution with p = 0.8.

Mean = 0.8

For the variable "Adults or Minors", Var[X] = p(1-p) = 0.8 * 0.2 = 0.16

SD[X] = \(\sqrt{p(1-p)}\) = \(\sqrt{0.16}\) = 0.4

SE = \(\frac{SD}{\sqrt{n}}\) = \(\frac{0.4}{\sqrt{10000}}\) = 0.4/100 = 0.004

Confidence Interval

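89% confidence interval = mean +/- 1.598 * SE = 0.8 +/- 0.006 = (0.794, 0.806)

If we repeatedly drew samples of 10,000 and built an interval this way each time, about 89% of those intervals would contain the true proportion of adults in the population. For this sample, that interval is 0.794 to 0.806.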

95% confidence interval = mean +/- 1.96 * SE = 0.8 +/- 0.008 = (0.792, 0.808)

With the same procedure at the 95% level, about 95% of such intervals would contain the true proportion of adults. For this sample, that interval is 0.792 to 0.808.

99% confidence interval = mean +/- 2.58 * SE = 0.8 +/- 0.010 = (0.790, 0.810)

With the same procedure at the 99% level, about 99% of such intervals would contain the true proportion of adults. For this sample, that interval is 0.790 to 0.810.
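A confidence level can also be checked as long-run coverage: across many repeated samples, roughly that share of the intervals should contain the true proportion. The Python sketch below illustrates this for the 95% level. The values p = 0.8 and n = 10,000 come from the text. The number of repetitions (500) and the seed are assumptions chosen only to keep the run short.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(5)  # fixed seed so the run is reproducible

p_true, n = 0.8, 10_000          # values from the text
reps = 500                       # number of simulated samples (assumption)
z = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value, about 1.96

covered = 0
for _ in range(reps):
    # Draw one sample of n Bernoulli(p_true) values and count the ones
    ones = sum(random.random() < p_true for _ in range(n))
    p_hat = ones / n
    se_hat = sqrt(p_hat * (1 - p_hat) / n)
    # Does this sample's interval contain the true proportion?
    if abs(p_hat - p_true) <= z * se_hat:
        covered += 1

coverage = covered / reps
print(f"Share of 95% intervals containing p = {p_true}: {coverage:.3f}")
```

Each interval is built from its own sample's estimate, so the intervals move from sample to sample while the true proportion stays fixed.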

Binary Variable Simulation

We randomly select 500 people from the population and compute the proportion of adults in that sample (the sample mean of the 0/1 variable). We repeat the draw 20, 200, and 2000 times.

20 times

The mean of the sample proportions is 0.8039.

The standard deviation of the sample proportions is 0.0140447

Standard error of the mean of the 20 sample proportions = \(\frac{0.0140}{\sqrt{20}}\) = 0.0031

200 times

The mean of the sample proportions is 0.80064.

The standard deviation of the sample proportions is 0.0169525

Standard error of the mean of the 200 sample proportions = \(\frac{0.0170}{\sqrt{200}}\) = 0.0012

2000 times

The mean of the sample proportions is 0.800022.

The standard deviation of the sample proportions is 0.0175954

Standard error of the mean of the 2000 sample proportions = \(\frac{0.0176}{\sqrt{2000}}\) = 0.0004

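Interpret

Every sample has the same size (500). What changes is the number of repeated samples. The formula-based standard error of a proportion for a sample of 500 is \(\sqrt{\frac{p(1-p)}{n}}\) = \(\frac{0.4}{\sqrt{500}}\) ≈ 0.0179.

The standard deviation of the sample proportions (0.0140, 0.0170, 0.0176) is the simulation-based estimate of that standard error, and it moves steadily toward 0.0179 as the number of repetitions grows from 20 to 2000.

Dividing that standard deviation by the square root of the number of repetitions (0.0031 > 0.0012 > 0.0004) measures how precisely the simulation pins down the mean of the sample proportions. It shrinks as the repetitions increase, which is why that mean (0.8039, 0.80064, 0.800022) converges to the true proportion of 0.8.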
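The binary simulation can be sketched in Python as follows. This is a minimal sketch, not the code used for the numbers above: it assumes each person is an independent Bernoulli(0.8) draw, which is equivalent to sampling with replacement, and the seed is arbitrary. The helper name `sample_proportion` is introduced here for illustration.

```python
import random
from math import sqrt
from statistics import stdev

random.seed(5)  # fixed seed so the run is reproducible

p, sample_size = 0.8, 500

# Formula-based standard error of a sample proportion
formula_se = sqrt(p * (1 - p) / sample_size)


def sample_proportion():
    """Proportion of ones in one sample of Bernoulli(p) draws."""
    return sum(random.random() < p for _ in range(sample_size)) / sample_size


simulated = {}
for reps in (20, 200, 2000):
    proportions = [sample_proportion() for _ in range(reps)]
    # SD of the sample proportions is the simulation-based standard error
    simulated[reps] = stdev(proportions)
    print(f"{reps:>5} repetitions: simulated SE = {simulated[reps]:.4f}, formula SE = {formula_se:.4f}")
```

With few repetitions the simulated value can land noticeably above or below the formula value. With many repetitions it should sit close to it.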